    Pattern Mining for Named Entity Recognition

    Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition (NER). Our research team has developed CasEN, a symbolic system based on finite-state transducers, which achieved promising results in the Ester2 French-language evaluation campaign. Despite these encouraging results, manually extending the coverage of such a hand-crafted system is difficult. In this paper, we present a novel pattern-mining approach to NER, intended to supplement our system's knowledge base. The system, mXS, exhaustively searches for hierarchical sequential patterns that aim to detect Named Entity boundaries. We evaluate these patterns both in standalone mode and in combination with our existing system.
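
    The abstract describes mXS only at a high level. As a much-simplified illustration of mining boundary patterns, the sketch below exhaustively counts the contiguous left contexts of an entity-start marker and keeps the frequent ones. It is not mXS's hierarchical sequential pattern algorithm (no gaps, no feature hierarchy), and the `<pers>` marker format and the `max_len` and `min_support` parameters are assumptions made for the example.

```python
from collections import Counter

def mine_left_contexts(sentences, marker, max_len=3, min_support=2):
    """Count every contiguous token sequence of length 1..max_len that ends
    immediately before `marker`, keeping those seen at least `min_support` times."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != marker:
                continue
            # Exhaustively enumerate left contexts of increasing length.
            for n in range(1, max_len + 1):
                if i - n < 0:
                    break
                counts[tuple(tokens[i - n:i])] += 1
    return {pat: c for pat, c in counts.items() if c >= min_support}

# Hypothetical toy corpus; "<pers>" marks the start of a person name.
corpus = [
    ["yesterday", ",", "the", "president", "<pers>", "Chirac", "</pers>", "said"],
    ["the", "president", "<pers>", "Sarkozy", "</pers>", "arrives"],
    ["according", "to", "minister", "<pers>", "Fillon", "</pers>"],
]
patterns = mine_left_contexts(corpus, "<pers>")
```

    Here only the contexts `("president",)` and `("the", "president")` reach the support threshold; in a real system such patterns would then be used as cues for predicting where an entity begins.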

    Preface

    Extending tree automata to model XML validation under element and attribute constraints

    Abstract: Algorithms for validation play a crucial role in the use of XML. Although much effort has gone into formalizing the treatment of elements, attributes have been neglected. This paper presents a validation model for XML documents that takes into account the element and attribute constraints imposed by a given DTD. Our main contribution is a new formalism that handles both kinds of constraints. To this end, we propose an extension of regular tree automata that allows the construction of a deterministic automaton with the same expressive power as a DTD. Our formalism gives rise to an efficient validation method.
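
    As a rough illustration of checking element and attribute constraints in a single pass, the sketch below validates an `xml.etree.ElementTree` tree against a hypothetical DTD-like table: each element's content model is a regular expression over its child tags, and each element lists its required and allowed attributes. Python's `re` engine stands in for the extended tree automata proposed in the paper, and text content (`#PCDATA`) and attribute types are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD-like schema: content models are regexes over "<tag>" tokens.
SCHEMA = {
    "book": {"content": r"<title>(<author>)+(<chapter>)*",
             "required": {"isbn"}, "allowed": {"isbn", "lang"}},
    "title": {"content": r"", "required": set(), "allowed": set()},
    "author": {"content": r"", "required": set(), "allowed": {"id"}},
    "chapter": {"content": r"", "required": set(), "allowed": set()},
}

def validate(elem, schema):
    """Recursively check elem against schema; return a list of error strings."""
    rule = schema.get(elem.tag)
    if rule is None:
        return [f"undeclared element <{elem.tag}>"]
    errors = []
    # Element constraint: the sequence of child tags must match the content model.
    children = "".join(f"<{child.tag}>" for child in elem)
    if re.fullmatch(rule["content"], children) is None:
        errors.append(f"<{elem.tag}>: children {children or '(none)'} do not match content model")
    # Attribute constraints: required attributes present, no undeclared ones.
    attrs = set(elem.attrib)
    for name in sorted(rule["required"] - attrs):
        errors.append(f"<{elem.tag}>: missing required attribute {name}")
    for name in sorted(attrs - rule["allowed"]):
        errors.append(f"<{elem.tag}>: undeclared attribute {name}")
    for child in elem:
        errors.extend(validate(child, schema))
    return errors

good = ET.fromstring('<book isbn="1"><title>T</title><author/><chapter/></book>')
bad = ET.fromstring('<book color="red"><author/><title>T</title></book>')
```

    `validate(good, SCHEMA)` returns an empty list, while `validate(bad, SCHEMA)` reports the wrong child order, the missing `isbn`, and the undeclared `color` attribute.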

    Schema evolution for XML: A consistency-preserving approach

    Abstract. This paper deals with updates of XML documents that satisfy a given schema, e.g., a DTD. In this context, when an update violates the schema, the update may still be accepted, which implies changing the schema. Our method is intended for a data administrator who is an expert in the application domain of the database but need not be a computer science expert. Our approach consists of proposing different schema options derived from the original one. The method is consistency-preserving: documents valid with respect to the original schema remain valid. Schema evolution is implemented by an algorithm (called GREC) that modifies the graph of a finite-state automaton and generates regular expressions for the modified graphs. Each regular expression proposed by GREC is a schema choice offered to the administrator.
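
    The abstract does not describe GREC's internals. The classic state-elimination construction below is one standard way to turn an automaton graph into a regular expression; it stands in for GREC here and is not taken from the paper. It also illustrates the consistency-preserving idea: adding a transition can only enlarge the accepted language, so child sequences that were valid before remain valid. The edge-map representation, the single-letter element names, and the helper names are assumptions.

```python
import re

# Regex builders over edge labels; None means "no edge" (the empty language).
def union(a, b):
    if a is None:
        return b
    if b is None or a == b:
        return a
    return f"(?:{a}|{b})"

def concat(*parts):
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def star(a):
    # No self-loop (or an epsilon self-loop) contributes only the empty word.
    return "" if a in (None, "") else f"(?:{a})*"

def to_regex(edges, start, finals):
    """State elimination: edges maps (src, dst) -> regex label."""
    S, F = "__start__", "__final__"  # fresh states, assumed not to clash
    g = dict(edges)
    g[(S, start)] = ""
    for f in finals:
        g[(f, F)] = ""
    states = {s for edge in edges for s in edge} | {start} | set(finals)
    for q in sorted(states):
        loop = star(g.pop((q, q), None))
        incoming = [(p, lab) for (p, r), lab in g.items() if r == q]
        outgoing = [(r, lab) for (p, r), lab in g.items() if p == q]
        for p, _ in incoming:
            del g[(p, q)]
        for r, _ in outgoing:
            del g[(q, r)]
        # Bypass q: every path p -> q -> r becomes a direct edge p -> r.
        for p, a in incoming:
            for r, b in outgoing:
                g[(p, r)] = union(g.get((p, r)), concat(a, loop, b))
    return g.get((S, F))

# Content model of a hypothetical element: "a" followed by any number of "b".
original = {(0, 1): "a", (1, 1): "b"}
old_regex = to_regex(original, 0, {1})
# An update inserted a "c" child; accepting it adds a c-loop on state 1.
extended = dict(original)
extended[(1, 1)] = union(original[(1, 1)], "c")
new_regex = to_regex(extended, 0, {1})
```

    The old model yields `a(?:b)*`; the extended one also accepts `acb` while still accepting `a`, `ab`, and `abb`.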

    Efficient schema-based revalidation of XML

    Abstract. As XML schemas evolve over time or as applications are integrated, it is sometimes necessary to validate an XML document known to conform to one schema with respect to another schema. More generally, XML documents known to conform to a schema may be modified and then require validation with respect to another schema. Recently, solutions have been proposed for incremental validation of XML documents. These solutions assume that the initial schema to which a document conforms and the final schema against which it must be validated after modifications are the same. Moreover, they assume that the input document may be preprocessed, which, in certain situations, may be computationally and memory intensive. In this paper, we describe how knowledge of conformance to an XML Schema (or DTD) may be used to determine conformance to another XML Schema (or DTD) efficiently. We examine both the situation where an XML document is modified before it is revalidated and the situation where it is unmodified.
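
    The abstract does not say exactly how conformance to the first schema is exploited. One natural ingredient, sketched here as an assumption rather than the paper's algorithm, is a language-inclusion test between the content models of the two schemas: if every child sequence allowed by the old model is also allowed by the new one, elements already valid under the old schema need no re-check of their child sequence. The DFA triple `(start, finals, delta)` is a hypothetical representation; a missing transition means the implicit dead state `None`.

```python
from collections import deque

def included(dfa1, dfa2, alphabet):
    """Return True iff every word accepted by dfa1 is also accepted by dfa2."""
    (s1, f1, d1), (s2, f2, d2) = dfa1, dfa2
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        # A reachable pair accepting in dfa1 but not in dfa2 is a counterexample.
        if p in f1 and q not in f2:
            return False
        for a in alphabet:
            p2 = d1.get((p, a))
            if p2 is None:
                continue  # dfa1 rejects every extension: nothing to check
            q2 = d2.get((q, a))  # None means dfa2 is in its dead state
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                queue.append((p2, q2))
    return True

# Hypothetical content models for <book>: old "title author", new "title author+".
old = (0, {2}, {(0, "title"): 1, (1, "author"): 2})
new = (0, {2}, {(0, "title"): 1, (1, "author"): 2, (2, "author"): 2})
alphabet = {"title", "author", "chapter"}
```

    Here `included(old, new, alphabet)` holds, so `<book>` children valid under the old schema are valid under the new one; the reverse check fails on `title author author`.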

    Efficient Incremental Validation of XML Documents after Composite Updates

    We describe an efficient method for the incremental validation of XML documents after composite updates. We introduce the class of Bounded-Edit (BE) DTDs and XML Schemas, and give a simple incremental revalidation algorithm that yields optimal performance for them, in the sense that its time complexity is linear in the number of operations in the update. We give extensive experimental results showing that our algorithm exhibits excellent scalability. Finally, we provide a statistical analysis of over 250 DTDs and XML Schema specifications found on the Web, showing that over 99% of them in fact belong to the BE class.
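
    The BE revalidation algorithm itself is not detailed in the abstract. The sketch below shows a generic incremental technique instead: keep the DFA state reached after each child of an element, and after relabeling one child, rerun the automaton from that position only until the recomputed state matches the stored one, at which point the rest of the run is unchanged. `ChildSequence` and `relabel` are hypothetical names; insertions, deletions, and the linear-time bound the paper proves for BE schemas are not covered.

```python
class ChildSequence:
    """Child tags of one element plus the stored DFA run over them (hypothetical helper)."""

    def __init__(self, dfa, tags):
        self.start, self.finals, self.delta = dfa
        self.tags = list(tags)
        # states[k] is the DFA state after reading tags[:k]; None is the dead state.
        self.states = [self.start]
        for t in self.tags:
            self.states.append(self._step(self.states[-1], t))

    def _step(self, state, tag):
        return None if state is None else self.delta.get((state, tag))

    def valid(self):
        return self.states[-1] in self.finals

    def relabel(self, i, new_tag):
        """Replace tags[i], repair the stored run, and return the number of steps recomputed."""
        self.tags[i] = new_tag
        steps = 0
        for k in range(i, len(self.tags)):
            new_state = self._step(self.states[k], self.tags[k])
            steps += 1
            if new_state == self.states[k + 1]:
                break  # run re-synchronized: the remaining states are unchanged
            self.states[k + 1] = new_state
        return steps

# Hypothetical content model (a|b)* c: state 0 is the start, state 1 is final.
dfa = (0, {1}, {(0, "a"): 0, (0, "b"): 0, (0, "c"): 1})
seq = ChildSequence(dfa, ["a", "b", "a", "b", "c"])
```

    Relabeling the second child from `b` to `a` costs one step because the run re-synchronizes immediately; relabeling the final `c` to `a` also costs one step but makes the sequence invalid.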